522 research outputs found

    On the Error Resilience of Ordered Binary Decision Diagrams

    Ordered Binary Decision Diagrams (OBDDs) are a data structure used in an increasing number of fields of Computer Science (e.g., logic synthesis, program verification, data mining, bioinformatics, and data protection) for representing and manipulating discrete structures and Boolean functions. The purpose of this paper is to study the error resilience of OBDDs and to design a resilient version of this data structure, i.e., a self-repairing OBDD. In particular, we describe some strategies that make reduced ordered OBDDs resilient to errors in the indexes associated with the input variables, or in the pointers (i.e., OBDD edges) of the nodes. These strategies exploit the inherent redundancy of the data structure, as well as the redundancy introduced by its efficient implementations. The solutions we propose allow the exact restoration of the original OBDD and are suitable for classical software packages for the manipulation of OBDDs currently in use. Another result of the paper is the definition of a new canonical OBDD model, called Index-resilient Reduced OBDD, which guarantees that a node with a faulty index has a reconstruction cost O(k), where k is the number of nodes with corrupted index.
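As a rough illustration of the data structure this abstract discusses, the following minimal sketch builds a reduced OBDD in which every node stores a variable index and two pointers. The class name `ROBDD`, its methods, and the use of a unique table to share nodes are assumptions made for this example; they are not taken from the paper and do not implement its resilience strategies.

```python
class ROBDD:
    """Minimal reduced OBDD sketch (hypothetical, for illustration only)."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        # Unique table: (index, low, high) -> node id. Ids 0 and 1 are
        # reserved for the terminal nodes FALSE and TRUE.
        self.unique = {}

    def mk(self, index, low, high):
        # Reduction rule 1: drop a node whose two pointers coincide.
        if low == high:
            return low
        # Reduction rule 2: share structurally identical nodes.
        key = (index, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.unique) + 2
        return self.unique[key]

    def build(self, f, index=0, assignment=()):
        # Exhaustive Shannon expansion in the fixed order x0, x1, ...
        # Exponential in num_vars, so suitable for tiny examples only.
        if index == self.num_vars:
            return int(bool(f(*assignment)))
        low = self.build(f, index + 1, assignment + (0,))
        high = self.build(f, index + 1, assignment + (1,))
        return self.mk(index, low, high)


bdd = ROBDD(3)
root = bdd.build(lambda a, b, c: a ^ b ^ c)
print(len(bdd.unique), "internal nodes for 3-variable parity")
```

In this sketch each node's index is recorded both in the node itself and implicitly through its position along every root-to-terminal path, which is one simple way to picture the kind of redundancy the abstract refers to.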

    The Opportunity of Data-Driven Services for Viral Genomic Surveillance

    The recent COVID-19 pandemic has posed novel challenges to the big data and knowledge management community. The unprecedented availability of viral genomes on public databases has made possible the data-driven exploration of viruses' evolution (especially of SARS-CoV-2, the virus responsible for the disease). Properties of data and knowledge in the genomic and virological domain may fuel data science methods for the identification and possible prediction of critical phenomena, such as the emergence of variants with improved transmissibility/virulence and recombined strains. A number of tools have been produced to explore the variants' trends or suggest hypotheses on the evolutionary mechanisms of the virus. In this perspective, we elaborate on plausible directions of this field of research, which are still applicable to the SARS-CoV-2 virus but may become even more relevant in the context of new outbreaks (e.g., monkeypox, malaria, diphtheria). Expressly, we point to 1) data-driven identification of mutations or variants with potential impact; 2) data-driven identification of recombination events - creating opportunities to overcome selective pressure and adapt to new environments and hosts (e.g., livestock or humans). These directions can be framed within genomic surveillance measures, characterized by the possibility of tracking viruses by using their genome, which is collected, sequenced, and submitted to public databases by laboratories around the world. If successful, genomic surveillance substantially supports the understanding of novel viral pathogens and of their dangerousness in terms of prevalence, infectivity, and transmissibility; the implemented services can be of great utility to decision-makers in healthcare. Here, we draw current trends, challenges, and future directions of data-driven services for genomic surveillance

    Conceptual models and databases for searching the genome

    Genomics is an extremely complex domain, in terms of concepts, their relations, and their representations in data. This tutorial introduces the use of ER models in the context of genomic systems: conceptual models are of great help for simplifying this domain and making it actionable. We carry out a review of successful models presented in the literature for representing biologically relevant entities and grounding them in databases. We draw a distinction between conceptual models that aim to explain the domain and conceptual models that aim to support database design and heterogeneous data integration. Genomic experiments and/or sequences are described by several metadata, specifying information on the sampled organism, the technology used, and the organizational process behind the experiment. Instead, we call data the actual regions of the genome that have been read by sequencing technologies and encoded into a machine-readable representation. First, we show how data and metadata can be modeled, then we exploit the proposed models for designing search systems, visualizers, and analysis environments. Both domains of human genomics and viral genomics are addressed, surveying several use cases and applications of broader public interest. The tutorial is relevant to the EDBT community because it demonstrates the usefulness of conceptual models’ principles within very current domains; in addition, it offers a concrete example of conceptual models’ use, setting the premises for interdisciplinary collaboration with a greater public (possibly including life science researchers)

    Searching COVID-19 clinical research using graphical abstracts

    Objective. Graphical abstracts are small graphs of concepts that visually summarize the main findings of scientific articles. While graphical abstracts are customarily used in scientific publications to anticipate and summarize their main results, we propose them as a means for expressing graph searches over existing literature. Materials and methods. We consider the COVID-19 Open Research Dataset (CORD-19), a corpus of more than one million abstracts; each of them is described as a graph of co-occurring ontological terms, selected from the Unified Medical Language System (UMLS) and the Ontology of Coronavirus Infectious Disease (CIDO). Graphical abstracts are also expressed as graphs of ontological terms, possibly augmented by utility terms describing their interactions (e.g., "associated with", "increases", "induces"). We build a co-occurrence network of concepts mentioned in the corpus; we then identify the best matches of graphical abstracts on the network. We exploit graph database technology and shortest-path queries. Results. We build a large co-occurrence network, consisting of 128,249 entities and 47,198,965 relationships. A well-designed interface allows users to explore the network by formulating or adapting queries in the form of an abstract; it produces a bibliography of publications, globally ranked; each publication is further associated with the specific parts of the abstract that it explains, thereby allowing the user to understand each aspect of the matching. Discussion and Conclusion. Our approach supports the process of scientific hypothesis formulation and evidence search; it can be reapplied to any scientific domain, although our mastering of UMLS makes it most suited to clinical domains. Comment: 12 pages, 6 figures
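The co-occurrence network and shortest-path queries mentioned in this abstract can be pictured with a small in-memory sketch. The function names `cooccurrence_counts` and `shortest_path`, the toy term lists, and the use of plain breadth-first search are assumptions made for illustration; the abstract states only that graph database technology is used, not how the queries are written.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations


def cooccurrence_counts(docs):
    """Count how many documents mention each unordered pair of terms."""
    counts = Counter()
    for terms in docs:
        # Sort the distinct terms so each pair has one canonical order.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def shortest_path(counts, source, target):
    """Breadth-first search over the undirected co-occurrence graph."""
    adj = defaultdict(set)
    for a, b in counts:
        adj[a].add(b)
        adj[b].add(a)
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # target is not reachable from source


# Toy corpus: each document is the list of terms found in one abstract.
docs = [
    ["covid-19", "fever"],
    ["fever", "ibuprofen"],
    ["covid-19", "fever", "cough"],
]
counts = cooccurrence_counts(docs)
print(shortest_path(counts, "covid-19", "ibuprofen"))
```

At the scale reported in the abstract (tens of millions of relationships), a dedicated graph database would replace these in-memory dictionaries.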

    Quantum Networks on Cubelike Graphs

    Cubelike graphs are the Cayley graphs of the elementary abelian group (Z_2)^n (e.g., the hypercube is a cubelike graph). We give conditions for perfect state transfer between two particles in quantum networks modeled by a large class of cubelike graphs. This generalizes results of Christandl et al. [Phys. Rev. Lett. 92, 187902 (2004)] and Facer et al. [Phys. Rev. A 92, 187902 (2008)]. Comment: 5 pages, 2 eps figures
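The definition of a cubelike graph given in this abstract can be made concrete with a short sketch. It assumes that elements of (Z_2)^n are encoded as n-bit integers, so that group addition becomes bitwise XOR; the function name `cubelike_graph` is hypothetical, and the code only constructs the graph, without testing any perfect state transfer condition.

```python
def cubelike_graph(n, connection_set):
    """Adjacency lists of the Cayley graph of (Z_2)^n.

    Vertices are the integers 0 .. 2**n - 1, read as n-bit vectors.
    Vertex v is adjacent to v XOR c for every c in the connection set.
    """
    gens = set(connection_set)
    if any(c <= 0 or c >= 2 ** n for c in gens):
        raise ValueError("connection set must contain nonzero n-bit vectors")
    return {v: sorted(v ^ c for c in gens) for v in range(2 ** n)}


# The 3-dimensional hypercube: the connection set is the standard basis.
q3 = cubelike_graph(3, [0b001, 0b010, 0b100])
print(q3[0])  # neighbours of the all-zero vertex
```

Because every element of (Z_2)^n is its own inverse, any connection set of nonzero vectors is automatically closed under inversion, so the resulting graph is undirected and regular of degree equal to the size of the connection set.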

    Processing genome-wide association studies within a repository of heterogeneous genomic datasets

    Background. Genome Wide Association Studies (GWAS) are based on the observation of genome-wide sets of genetic variants – typically single-nucleotide polymorphisms (SNPs) – in different individuals that are associated with phenotypic traits. Research efforts have so far been directed to improving GWAS techniques rather than to making the results of GWAS interoperable with other genomic signals; this is currently hindered by the use of heterogeneous formats and uncoordinated experiment descriptions. Results. To practically facilitate integrative use, we propose to include GWAS datasets within the META-BASE repository, exploiting an integration pipeline previously studied for other genomic datasets that includes several heterogeneous data types in the same format, queryable from the same systems. We represent GWAS SNPs and metadata by means of the Genomic Data Model and include metadata within a relational representation by extending the Genomic Conceptual Model with a dedicated view. To further reduce the gap with the descriptions of other signals in the repository of genomic datasets, we perform a semantic annotation of phenotypic traits. Our pipeline is demonstrated using two important data sources, initially organized according to different data models: the NHGRI-EBI GWAS Catalog and FinnGen (University of Helsinki). The integration effort finally allows us to use these datasets within multisample processing queries that respond to important biological questions. These are then made usable for multi-omic studies together with, e.g., somatic and reference mutation data, genomic annotations, epigenetic signals. Conclusions. As a result of our work on GWAS datasets, we enable 1) their interoperable use with several other homogenized and processed genomic datasets in the context of the META-BASE repository; 2) their big data processing by means of the GenoMetric Query Language and associated system. Future large-scale tertiary data analysis may extensively benefit from the addition of GWAS results to inform several different downstream analysis workflows

    Biconditional-BDD Ordering for Autosymmetric Functions

    Autosymmetric functions are particular "regular" Boolean functions that are exploited for logic optimization, since it is possible to reduce the number of variables and the number of points of the original autosymmetric function before its synthesis. In this paper we study this regularity in order to derive a suitable variable ordering for Biconditional Binary Decision Diagrams (BBDDs). BBDDs are a new version of BDDs whose nodes hold the EXOR of two variables (instead of a single variable). These diagrams are employed for logic synthesis in new technologies such as silicon nanowires and DG-SiNWFETs. We show that it is possible to find a useful variable ordering for these functions, and the experimental results validate our approach, showing that in 97% of the cases we get an ordering that gives a number of nodes lower than or equal to the one obtained with the standard ordering

    Exploring the evolution of research topics during the COVID-19 pandemic

    The COVID-19 pandemic has changed the research agendas of most scientific communities, resulting in an overwhelming production of research articles in a variety of domains, including medicine, virology, epidemiology, economy, psychology, and so on. Several open-access corpora and literature hubs were established; among them, the COVID-19 Open Research Dataset (CORD-19) has systematically gathered scientific contributions for 2.5 years, by collecting and indexing over one million articles. Here, we present the CORD-19 Topic Visualizer (CORToViz), a method and associated visualization tool for inspecting the CORD-19 textual corpus of scientific abstracts. Our method is based upon a careful selection of up-to-date technologies (including large language models), resulting in an architecture for clustering articles along orthogonal dimensions and extraction techniques for temporal topic mining. Topic inspection is supported by an interactive dashboard, providing fast, one-click visualization of topic contents as word clouds and topic trends as time series, equipped with easy-to-drive statistical testing for analyzing the significance of topic emergence along arbitrarily selected time windows. The processes of data preparation and results visualization are completely general and virtually applicable to any corpus of textual documents - thus suited for effective adaptation to other contexts. Comment: 16 pages, 6 figures, 1 table